estimator for $p$ is given as well. Our results demonstrate that our estimator
behaves very much like the kernel type deconvolution estimator in the
classical deconvolution problem.

1. Introduction

Let $X_1, \ldots, X_n$ be i.i.d. copies of a random variable $X = Y + \sigma Z$,
where $X_i = Y_i + \sigma Z_i$, and $Y_i$ and $Z_i$ are independent and have the
same distribution as $Y$ and $Z$, respectively. Assume that the $Y_i$'s are
unobservable and that $Y = UV$, where $U$ and $V$ are independent, $U$ has a
Bernoulli distribution with probability of zero equal to $p$ (we assume that
$0 < p < 1$) and $V$ has a distribution function $F$ with density $f$.
Furthermore, let the random variable $Z$ have a standard normal distribution
and let $\sigma$ be a known positive number. Then $X$ has a density, which we
denote by $q$. The distribution of $Y$ is completely determined by $f$ and $p$.
Note that the distribution of $Y$ has an atom at zero. Based on a sample
$X_1, \ldots, X_n$, we consider the problem of (nonparametric) estimation of
the density $f$ and the probability $p$.
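The sampling scheme just described can be sketched in a few lines. This is only an illustration under assumptions that are not part of the model statement: the function name `simulate_sample`, the callback `draw_v` and the particular choice $V \sim N(3, 9)$, $p = 0.1$, $\sigma = 1$ are hypothetical.

```python
import random


def simulate_sample(n, p, sigma, draw_v, seed=0):
    """Draw n observations X_i = U_i * V_i + sigma * Z_i.

    U_i equals 0 with probability p and 1 otherwise, V_i is produced by
    draw_v(rng), and Z_i is standard normal. Returns (xs, ys), where ys
    holds the Y_i = U_i * V_i that would be unobservable in practice.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = 0 if rng.random() < p else 1      # P(U = 0) = p
        y = u * draw_v(rng)                   # Y has an atom at zero
        xs.append(y + sigma * rng.gauss(0.0, 1.0))
        ys.append(y)
    return xs, ys


# Hypothetical example: V ~ N(3, 9) (standard deviation 3), p = 0.1, sigma = 1.
xs, ys = simulate_sample(1000, 0.1, 1.0, lambda rng: rng.gauss(3.0, 3.0), seed=1)
zero_fraction = sum(1 for y in ys if y == 0) / len(ys)
```

About a fraction $p$ of the $Y_i$ are exactly zero, whereas none of the $X_i$ are: the added noise hides the atom, which is why deconvolution is needed.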

Our estimation problem is closely related to the classical deconvolution
problem, where the situation is as described above, except that in the
classical case $p$ vanishes and $Y_i$ has a continuous distribution with
density $f$, which we want to estimate. The $Y_i$'s can for instance be
interpreted as measurements of some characteristic of interest, contaminated
by noise $\sigma Z_i$. Some works on deconvolution include [3, 4, 6, 7, 9, 10,
11, 13, 14, 16, 19, 20, 21, 22, 23, 28, 30, 32, 35, 38, 39, 42, 43, 45, 46]
and [50]. Practical problems related to deconvolution can be found e.g. in
[31], which provides a general account of mixture models. The deconvolution
problem is also related to empirical Bayes estimation of the prior
distribution, see e.g. [2] and [33]. Yet another application field is
nonparametric errors-in-variables regression, see [24].

Unlike the classical deconvolution problem, in our case $Y$ does not have a
density, because the distribution of $Y$ has an atom at zero. Hence our
results, apart from the direct applications below, will also provide insight
into the robustness of the deconvolution estimator when the assumption of
absolute continuity is violated.

One situation where the atomic deconvolution can arise is the following: one
might think of the $X_i$'s as increments $X_i - X_{i-1}$ of a stochastic
process $X_t = Y_t + \sigma Z_t$, where $Y = (Y_t)_{t \geq 0}$ is a compound
Poisson process with intensity $\lambda$ and jump size density $\rho$, and
$Z = (Z_t)_{t \geq 0}$ is a Brownian motion independent of $Y$. The
distribution of $Y_i - Y_{i-1}$ then has an atom at zero with probability
equal to $e^{-\lambda}$, while $Z_i - Z_{i-1}$ has a standard normal
distribution. Notice that $X = (X_t)_{t \geq 0}$ is a Lévy process, see
Example 8.5 in [37]. An exponential of the process $X$ can be used to model
the evolution of a stock price, see [34]. The law of $X$ can be completely
characterised by $f$, $\lambda$ and $\sigma$. Furthermore, estimation of $f$
in the atomic deconvolution context is closely related to estimation of the
jump size density of a compound Poisson process $Y$, which is contaminated by
noise coming from a Brownian motion, see [26].

Another practical situation might arise in missing data problems. Suppose
for instance that a measurement device is used to measure some quantity of
interest and that it has a fixed probability $p$ of failure to detect this
quantity, in which case it renders zero. Repetitive measurements can be
modelled by random variables $Y_i$ defined as above. Assume that our goal is
to estimate the density $f$ and the probability $p$. In practice measurements
are often contaminated by an additive measurement error and to account for
this, we add the noise $\sigma Z_i$ to our measurements ($\sigma$ quantifies
the noise level). If we could directly use the measurements $Y_i$, then the
zero measurements could be discarded and we would have observations with
density $f$ to base our estimator on. However, due to the additional noise
$\sigma Z_i$, the zeroes cannot be distinguished from the nonzero $Y_i$'s.
The use of deconvolution techniques is thus unavoidable. The same situation
occurs for instance when the $Y_i$ are left truncated at zero. In the
error-free case, i.e. when $\sigma = 0$, estimation of the mean and variance
of a positive random variable $V$ was considered in [1]. Our model appears to
be more general.

In what follows, we first assume that $p$ is known and construct an estimator
for $f$. After this, in the model where $p$ is unknown, we will provide an
estimator for $p$ and then propose a plug-in type estimator for $f$. An
estimator for $f$ will be constructed via methods similar to those used in the
classical deconvolution problem. In particular we will use Fourier inversion
and kernel smoothing. Let $\phi_X$, $\phi_Y$ and $\phi_f$ denote the
characteristic functions of the random variables $X$, $Y$ and $V$,
respectively. Notice that the characteristic function of $Y$ is given by

$$\phi_Y(t) = p + (1-p)\phi_f(t). \tag{1.1}$$

Furthermore, since

$$\phi_X(t) = \phi_Y(t)e^{-\sigma^2t^2/2} = (p + (1-p)\phi_f(t))e^{-\sigma^2t^2/2},$$

the characteristic function of $V$ can be expressed as

$$\phi_f(t) = \frac{\phi_X(t) - pe^{-\sigma^2t^2/2}}{(1-p)e^{-\sigma^2t^2/2}}.$$

Assuming that $\phi_f$ is integrable, by Fourier inversion we get

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,\frac{\phi_X(t) - pe^{-\sigma^2t^2/2}}{(1-p)e^{-\sigma^2t^2/2}}\,dt. \tag{1.2}$$

An obvious way to construct an estimator of $f(x)$ from this relation is to
estimate the characteristic function $\phi_X(t)$ by its empirical counterpart,

$$\phi_{emp}(t) = \frac{1}{n}\sum_{j=1}^{n} e^{itX_j},$$

see e.g. [25] for a discussion of its applications in statistics, and then
obtain the estimator of $f$ by a plug-in device. Alternatively, one can
estimate the density $q$ of $X$ by a kernel estimator

$$q_{nh}(x) = \frac{1}{nh}\sum_{j=1}^{n} w\left(\frac{x - X_j}{h}\right),$$

where $w$ denotes a kernel function and $h > 0$ is a bandwidth. Denote by
$\phi_w$ the Fourier transform of the kernel $w$. The characteristic function
of $q_{nh}$, which is equal to $\phi_{emp}(t)\phi_w(ht)$, will serve as an
estimator of $\phi_q$, the characteristic function of $q$. A naive estimator
of $f$ can then be obtained by a plug-in device, and would be

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,\frac{\phi_{emp}(t)\phi_w(ht) - pe^{-\sigma^2t^2/2}}{(1-p)e^{-\sigma^2t^2/2}}\,dt. \tag{1.3}$$

However, this procedure is not always meaningful, because the integrand in
(1.3) is not integrable in general. Therefore, instead of (1.3), we define our
estimator of $f$ as

$$f_{nh}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,\frac{\phi_{emp}(t) - pe^{-\sigma^2t^2/2}}{(1-p)e^{-\sigma^2t^2/2}}\,\phi_w(ht)\,dt, \tag{1.4}$$

where the integral is well-defined under the assumption that $\phi_w$ has a
compact support on $[-1,1]$. Notice that

$$f_{nh}(x) = \frac{\tilde f_{nh}(x)}{1-p} - \frac{p}{1-p}\,w_h(x), \tag{1.5}$$

where

$$\tilde f_{nh}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,\frac{\phi_{emp}(t)\phi_w(ht)}{e^{-\sigma^2t^2/2}}\,dt \tag{1.6}$$

and $w_h(x) = (1/h)w(x/h)$. Hence $\tilde f_{nh}$ has the same form as an
ordinary deconvolution kernel density estimator based on the sample
$X_1, \ldots, X_n$, see e.g. pp. 231-232 in [49].
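A minimal numerical sketch of the estimator (1.4) follows, under several assumptions that are not taken from the text: the integral is approximated by a midpoint rule with `m` nodes, the kernel is the one with Fourier transform $(1-t^2)^2$ on $[-1,1]$, and the simulated data ($V \sim N(3,9)$, $p = 0.1$, $\sigma = 1$, $h = 0.58$) are purely illustrative. Because $\phi_w$ is real and symmetric, the imaginary parts cancel and (1.4) reduces to an integral of the real part over $(0, 1/h)$.

```python
import math
import random


def phi_w(t):
    """Fourier transform (1 - t^2)^2 on [-1, 1] of the smoothing kernel w."""
    return (1.0 - t * t) ** 2 if abs(t) <= 1.0 else 0.0


def f_nh(x, xs, p, sigma, h, m=200):
    """Midpoint-rule approximation of the estimator (1.4) at the point x.

    Uses Re[exp(-itx) * phi_emp(t)] = mean_j cos(t * (X_j - x)) and the
    symmetry of the integrand, so that
    f_nh(x) = 1 / (pi * (1 - p)) * int_0^{1/h} [...] dt.
    """
    n = len(xs)
    dt = (1.0 / h) / m
    total = 0.0
    for i in range(m):
        t = (i + 0.5) * dt
        re_emp = sum(math.cos(t * (xj - x)) for xj in xs) / n
        deconvolved = re_emp * math.exp(0.5 * (sigma * t) ** 2) - p * math.cos(t * x)
        total += deconvolved * phi_w(h * t)
    return total * dt / (math.pi * (1.0 - p))


# Illustrative data (assumed, not from the text): V ~ N(3, 9), p = 0.1, sigma = 1.
rng = random.Random(0)
p, sigma = 0.1, 1.0
xs = [(0.0 if rng.random() < p else rng.gauss(3.0, 3.0)) + sigma * rng.gauss(0.0, 1.0)
      for _ in range(1000)]
estimate = f_nh(3.0, xs, p, sigma, h=0.58)
```

For this setup the target value is $f(3) = 1/(3\sqrt{2\pi}) \approx 0.133$; the estimate should fall in that vicinity, up to sampling noise and smoothing bias.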

Under the assumption of integrability of $\phi_f$ and some additional
restrictions on $w$, the bias of the estimator (1.4) will asymptotically
vanish as $h \to 0$. Indeed,

$$E[f_{nh}(x)] - f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_f(t)(\phi_w(ht) - 1)\,dt. \tag{1.7}$$

The result follows via the dominated convergence theorem, once we know that
$\phi_w$ is bounded and $\phi_w(0) = 1$. Observe that (1.7) coincides with the
bias of an ordinary kernel density estimator based on a sample from $f$. In
case we know that $f$ belongs to a specific Hölder class, it is possible to
derive an order bound for (1.7) in terms of some power of $h$, see
Proposition 1.2 in [40]. Further properties of kernel density estimators can
be found in [15, 17, 36, 40, 48] and [49].

B. van Es et al./Deconvolution for an atomic distribution

Estimation of $p$ is not as easy as it might appear at first sight. Indeed,
due to the convolution structure $X = Y + \sigma Z$, the random variable $X$
has a density and the atom in the distribution of $Y$ is not inherited by the
distribution of $X$. On the other hand $p$ is identifiable, since

$$\lim_{t\to\infty}\frac{\phi_X(t)}{e^{-\sigma^2t^2/2}} = \lim_{t\to\infty}\phi_Y(t) = p,$$

because $\phi_f(t) \to 0$ as $t \to \infty$ by the Riemann-Lebesgue theorem.
However, this relation cannot be used as a hint for the construction of a
meaningful estimator of $p$ because of the oscillating behaviour of
$\phi_{emp}(t)$, the obvious estimator of $\phi_X(t)$, as $t \to \infty$.

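As an estimator of $p$ we propose

$$p_{ng} = \frac{g}{2}\int_{-1/g}^{1/g}\frac{\phi_{emp}(t)\phi_k(gt)}{e^{-\sigma^2t^2/2}}\,dt, \tag{1.8}$$

where the number $g > 0$ denotes a bandwidth and $\phi_k$ denotes the Fourier
transform of a kernel $k$. We assume that $\phi_k$ has support $[-1, 1]$. The
definition of $p_{ng}$ is motivated by the fact that

$$\lim_{g\to 0}\frac{g}{2}\int_{-1/g}^{1/g}\frac{\phi_X(t)}{e^{-\sigma^2t^2/2}}\,dt
= \lim_{g\to 0}\frac{g}{2}\int_{-1/g}^{1/g}\phi_Y(t)\,dt
= \lim_{g\to 0}\frac{g}{2}\int_{-1/g}^{1/g}(p + (1-p)\phi_f(t))\,dt = p.$$

Assuming the integrability of $\phi_f$, the last equality follows from

$$\int_{-1/g}^{1/g}|\phi_f(t)|\,dt \leq \int_{-\infty}^{\infty}|\phi_f(t)|\,dt < \infty.$$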
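The estimator of $p$ can be sketched numerically in the same way. The code below is an illustration only: it assumes a midpoint rule with `m` nodes, a kernel whose Fourier transform is $\frac{693}{8}t^6(1-t^2)^2$ on $[-1,1]$ (it integrates to 2), and hypothetical simulated data with $V \sim N(3,9)$, $p = 0.1$, $\sigma = 1$ and $g = 0.5$. By symmetry, $\frac{g}{2}\int_{-1/g}^{1/g}$ of the integrand equals $g\int_0^{1/g}$ of its real part.

```python
import math
import random


def phi_k(t):
    """Fourier transform (693/8) t^6 (1 - t^2)^2 on [-1, 1]; integrates to 2."""
    return (693.0 / 8.0) * t ** 6 * (1.0 - t * t) ** 2 if abs(t) <= 1.0 else 0.0


def p_ng(xs, sigma, g, m=200):
    """Midpoint-rule approximation of the estimator of p with bandwidth g."""
    n = len(xs)
    dt = (1.0 / g) / m
    total = 0.0
    for i in range(m):
        t = (i + 0.5) * dt
        re_emp = sum(math.cos(t * xj) for xj in xs) / n   # Re phi_emp(t)
        total += re_emp * phi_k(g * t) * math.exp(0.5 * (sigma * t) ** 2)
    return g * total * dt


# Illustrative data (assumed, not from the text): V ~ N(3, 9), p = 0.1, sigma = 1.
rng = random.Random(0)
true_p, sigma = 0.1, 1.0
xs = [(0.0 if rng.random() < true_p else rng.gauss(3.0, 3.0)) + sigma * rng.gauss(0.0, 1.0)
      for _ in range(1000)]
p_hat = p_ng(xs, sigma, g=0.5)
```

The factor $e^{\sigma^2t^2/2}$ amplifies the sampling noise of $\phi_{emp}$ at large $|t|$, so a single estimate can deviate noticeably from the true $p$ at moderate sample sizes.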

Finally, let us consider the general case when both $p$ and $f$ are unknown.
Plugging in an estimator of $p$ into (1.4) leads to the following definition
of an estimator of $f$,

$$f^*_{nhg}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,\frac{\phi_{emp}(t) - \hat p_{ng}e^{-\sigma^2t^2/2}}{(1-\hat p_{ng})e^{-\sigma^2t^2/2}}\,\phi_w(ht)\,dt, \tag{1.9}$$

where

$$\hat p_{ng} = \min(p_{ng}, 1 - \epsilon_n). \tag{1.10}$$

Here $0 < \epsilon_n < 1$ and $\epsilon_n \downarrow 0$ at a suitable rate,
which will be specified in Condition 1.5. The truncation of $p_{ng}$ in (1.10)
is introduced for technical reasons, see formula (5.18), where we need that
the random variable $1 - \hat p_{ng}$ is bounded away from zero.

In practice it might also happen that the error variance $\sigma^2$ is
unknown and hence has to be estimated. This is a difficult problem in
classical deconvolution density estimation if only observations
$X_1, \ldots, X_n$ are available, as the convergence rate for estimation of
$\sigma$ is not the usual $\sqrt{n}$ rate, see e.g. [35]. Moreover, the
convergence rate of an estimator of $\sigma$ would dominate the asymptotics.
If additional measurements are available, then as suggested for instance in
[9], $\sigma$ can be estimated e.g. via the empirical variance of the
difference of replicated observations or by the method of moments via
instrumental variables. A recent paper on this subject is [14]. We do not
pursue this question any further and assume that $\sigma$ is known.

Concluding this section, we introduce some technical conditions on the
density $f$, kernels $w$ and $k$, bandwidths $h$ and $g$ and the sequence
$\epsilon_n$. These are needed in the proof of Theorem 2.5, the main theorem
of the paper, and subsequent results. Weaker forms of these conditions are
sufficient to prove other results from Section 2 and will be given directly in
the corresponding statements.

Condition 1.1. There exists a number $\gamma > 0$, such that
$u^{\gamma}\phi_f(u)$ is integrable.

Condition 1.2. Let $\phi_w$ be bounded, real valued, symmetric and have
support $[-1, 1]$. Let $\phi_w(0) = 1$ and let

$$\phi_w(1-t) = At^{\alpha} + o(t^{\alpha}), \quad \text{as } t \downarrow 0, \tag{1.11}$$

for some constants $A$ and $\alpha \geq 0$. Moreover, we assume that
$\gamma > 1 + 2\alpha$.

This condition is similar to the one used in [30] and [46] in the classical
deconvolution problem. An example of a kernel that satisfies this condition is

$$w(x) = -\frac{8\,(3x\cos x + (-3 + x^2)\sin x)}{\pi x^5}. \tag{1.12}$$

Its Fourier transform is given by

$$\phi_w(t) = (1-t^2)^2\,1_{[-1,1]}(t). \tag{1.13}$$

In this case $\alpha = 2$ and $A = 4$. The kernel (1.12) and its Fourier
transform are plotted in Figures 1 and 2.
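As a sanity check on this example, the sketch below inverts $\phi_w(t) = (1-t^2)^2$ numerically and compares the result with the closed form $-8(3x\cos x + (x^2-3)\sin x)/(\pi x^5)$. That closed form is what the inversion formula $w(x) = \frac{1}{2\pi}\int e^{-itx}\phi_w(t)\,dt$ yields; this convention is an assumption made here to match (1.4). The code also confirms $A = 4$, $\alpha = 2$ from the behaviour of $\phi_w(1-t)$ near $t = 0$. The function names and the midpoint rule are illustrative choices.

```python
import math


def phi_w(t):
    """Fourier transform (1 - t^2)^2 on [-1, 1]."""
    return (1.0 - t * t) ** 2 if abs(t) <= 1.0 else 0.0


def w_closed(x):
    """Closed form of the inverse transform of phi_w (valid for x != 0)."""
    return -8.0 * (3.0 * x * math.cos(x) + (x * x - 3.0) * math.sin(x)) / (math.pi * x ** 5)


def w_numeric(x, m=2000):
    """(1 / pi) * int_0^1 cos(t x) phi_w(t) dt by the midpoint rule."""
    dt = 1.0 / m
    return sum(math.cos((i + 0.5) * dt * x) * phi_w((i + 0.5) * dt) for i in range(m)) * dt / math.pi


max_gap = max(abs(w_closed(x) - w_numeric(x)) for x in (0.5, 1.0, 3.0, 7.0))

# phi_w(1 - t) / t^2 should approach A = 4 as t decreases to 0.
t = 1e-3
a_estimate = phi_w(1.0 - t) / t ** 2
```

The quantity `max_gap` should be negligible, and `a_estimate` should be close to 4.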




Fig 1. The kernel (1.12).

Fig 2. The Fourier transform of the kernel (1.12).
Condition 1.3. Let $\phi_k$ be real valued, symmetric and have support
$[-1, 1]$. Let $\phi_k$ integrate to 2 and let

$$\phi_k(t) = Bt^{\gamma} + o(t^{\gamma}), \tag{1.14}$$
$$\phi_k(1-t) = Ct^{\alpha} + o(t^{\alpha}), \tag{1.15}$$

as $t \downarrow 0$. Here $B$ and $C$ are some constants, and $\gamma$ and
$\alpha$ are the same as above.

An example of such a kernel is given by

$$k(x) = -\frac{2079x(-151200 + 21840x^2 - 730x^4 + 7x^6)\cos x + 693(453600 - 216720x^2 + 13950x^4 - 255x^6 + x^8)\sin x}{\pi x^{11}}. \tag{1.16}$$

Its Fourier transform is given by

$$\phi_k(t) = \frac{693}{8}\,t^6(1-t^2)^2\,1_{[-1,1]}(t). \tag{1.17}$$

In this case $B = 693/8$, $\gamma = 6$, $\alpha = 2$ and $C = 693/2$. The
kernel (1.16) and its Fourier transform are plotted in Figures 3 and 4.
Condition (1.14) is only needed when $\hat p_{ng}$ is plugged into
$f^*_{nhg}$, but not if $p_{ng}$ is used as an estimator of $p$.

Fig 3. The kernel (1.16).

Fig 4. The Fourier transform of the kernel (1.16).

Condition 1.4. Let the bandwidths $h$ and $g$ depend on $n$, $h = h_n$ and
$g = g_n$, and let

$$h_n = \sigma((1 + \eta_n)\log n)^{-1/2},$$
$$g_n = \sigma((1 + \delta_n)\log n)^{-1/2},$$

where $\eta_n$ and $\delta_n$ are such that $\eta_n \downarrow 0$,
$\delta_n \downarrow 0$, $\eta_n - \delta_n > 0$, and

$$(\eta_n - \delta_n)\log n \to \infty.$$

Furthermore, we assume that

$$-\eta_n\log n + (1 + 2\alpha)\log\log n \to \infty,$$
$$-\delta_n\log n + (1 + 2\alpha)\log\log n \to \infty. \tag{1.18}$$


An example of $\eta_n$ and $\delta_n$ in the definition above is

$$\eta_n = \frac{2\log\log\log n}{\log n}, \qquad \delta_n = \frac{\log\log\log n}{\log n}.$$



The conditions on the bandwidths $h_n$ and $g_n$ in Condition 1.4 are not the
only possible ones, and other restrictions are also possible. However, the
logarithmic decay of $h_n$ and $g_n$ is unavoidable. Following the default
convention in kernel density estimation and to keep the notation compact, we
will suppress the index $n$ when writing $h_n$ and $g_n$ and will write $h$
and $g$ instead, since no ambiguity will arise.
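To get a feeling for how slowly such bandwidths shrink, the following sketch evaluates $\sigma((1+\eta)\log n)^{-1/2}$ for a few sample sizes. The function name `bandwidth` is hypothetical, and $\eta$ is passed as a plain number rather than derived from a particular sequence $\eta_n$.

```python
import math


def bandwidth(n, sigma, eta):
    """Bandwidth sigma * ((1 + eta) * log n)^(-1/2) of the form used in Condition 1.4."""
    return sigma / math.sqrt((1.0 + eta) * math.log(n))


# With eta = 0 and sigma = 1, a thousandfold increase of n shrinks the
# bandwidth only by the factor sqrt(log(10^3) / log(10^6)) = 1 / sqrt(2).
h_small = bandwidth(10 ** 3, 1.0, 0.0)
h_large = bandwidth(10 ** 6, 1.0, 0.0)
shrink = h_large / h_small
```

Going from $n = 10^3$ to $n = 10^6$ reduces the bandwidth by less than 30 per cent, which illustrates why bandwidths of this type remain fairly large even for large samples.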

Condition 1.5. Let $\epsilon_n \downarrow 0$ be such that

$$\frac{\log\epsilon_n}{(\eta_n - \delta_n)\log n} \to 0.$$

An example of such $\epsilon_n$ for $\eta_n$ and $\delta_n$ given above is
$(\log\log\log n)^{-1}$.

The remainder of the paper is organised as follows: in Section 2 we derive
the theorem establishing the asymptotic normality of $f_{nh}(x)$, the fact
that the estimator $p_{ng}$ is weakly consistent, and finally that the
estimator $f^*_{nhg}(x)$ is asymptotically normal. Section 3 contains
simulation examples. Section 4 discusses a method for implementation of the
estimator in practice. All the proofs are collected in Section 5.

2. Main results

We will first study the estimation of $f$ when $p$ is known, and then proceed
to the general case with unknown $p$. The reason for this is twofold. Firstly,
it is interesting to compare the behaviour of the estimator of $f$ under the
assumption of known and unknown $p$, and secondly, the proofs of the results
for the latter case rely heavily on the proofs for the former case.

The first result in this section deals with the nonrobustness of the
classical deconvolution estimator. In ordinary kernel deconvolution, when it
is assumed that $Y$ is absolutely continuous, the estimator for its density is
defined as

$$\tilde f_{nh}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,\frac{\phi_{emp}(t)\phi_w(ht)}{e^{-\sigma^2t^2/2}}\,dt. \tag{2.1}$$

Now suppose that the assumption of absolute continuity of $Y$ is violated.
What will happen if we still use the estimator $\tilde f_{nh}(x)$? The
following result addresses this question.

Theorem 2.1. Let $\tilde f_{nh}(x)$ be defined as in (2.1). Assume that
$\phi_w$ is bounded and has a compact support on $[-1,1]$. Then

$$E[\tilde f_{nh}(x)] = pw_h(x) + (1-p)f * w_h(x), \tag{2.2}$$

where $w_h(\cdot) = (1/h)w(\cdot/h)$, and $*$ denotes convolution.

From this theorem it follows that $E[\tilde f_{nh}(0)]$, the expectation of
the classical deconvolution estimator (2.1) at zero, diverges to infinity as
$h \to 0$, because so does $h^{-1}w(0)$, if $w(0) \neq 0$ (the latter is the
case for the majority of conventional kernels). In practice this will also
result in an equally undesirable behaviour of $E[\tilde f_{nh}(x)]$ in the
neighbourhood of zero. When $x \neq 0$, with a proper selection of a kernel
$w$, one can achieve that the first term in (2.2) asymptotically vanishes as
$h \to 0$. Indeed, it is sufficient to assume that $w$ is such that
$\lim_{u\to\pm\infty} uw(u) = 0$. The second term in (2.2) will converge to
$(1-p)f(x)$ as $h \to 0$, provided that $\phi_f$ is integrable, $\phi_w$ is
bounded and $\phi_w(0) = 1$. These facts address the issue of the
nonrobustness of $\tilde f_{nh}$: under a misspecified model, i.e. under the
assumption that the distribution of $Y$ is absolutely continuous, while in
fact it has an atom at zero, the classical deconvolution estimator will
exhibit unsatisfactory behaviour near zero. This will happen despite the fact
that $\tilde f_{nh}(x)$ will be asymptotically normal when centred at its
expectation and suitably normalised, see Corollary 5.1 in Section 5. The
asymptotic normality follows from Lemmas 5.2 and 5.3 of Section 5, where only
absolute continuity of the distribution of $X$ is required.

Our next goal is to establish the asymptotic normality of the estimator
$f_{nh}(x)$. We formulate the corresponding theorem below.

Theorem 2.2. Assume that $\phi_f$ is integrable. Let $E[X^2] < \infty$, and
suppose that Condition 1.2 holds. Let $f_{nh}$ be defined as in (1.4). Then,
as $n \to \infty$ and $h \to 0$,

$$\frac{\sqrt{n}}{h^{1+2\alpha}e^{\sigma^2/(2h^2)}}\,(f_{nh}(x) - E[f_{nh}(x)]) \xrightarrow{D} N\left(0, \frac{A^2}{2\pi^2(1-p)^2}\left(\frac{1}{\sigma^2}\right)^{2+2\alpha}(\Gamma(\alpha+1))^2\right), \tag{2.3}$$

where $\Gamma$ denotes the gamma function,
$\Gamma(t) = \int_0^{\infty} v^{t-1}e^{-v}\,dv$.

Note that Theorem 2.2 establishes asymptotic normality of $f_{nh}$ under an
atomic distribution, which constitutes a generalisation of a result in [46]
(see also [45]) for the case of the classical deconvolution problem. The
generalisation is possible, because the proof uses only the continuity of the
density of $X$, which is still true when $Y$ has a distribution with an atom.
Furthermore, notice that in order to get a consistent estimator, from this
theorem it follows that $\sqrt{n}\,h^{-1-2\alpha}e^{-\sigma^2/(2h^2)}$ has to
diverge to infinity. Therefore the bandwidth $h$ has to be at least of order
$(\log n)^{-1/2}$, as is actually stated in Condition 1.4. In practice this
implies that the bandwidth $h$ has to be selected fairly large, even for large
sample sizes. This is the case for the classical deconvolution problem as well
in the case of a supersmooth error distribution, cf. [46].

Observe that the asymptotic variance in (2.3) does not depend on the target
density $f$ nor on the point $x$. This phenomenon is quite peculiar, but is
already known in classical deconvolution kernel density estimation, see for
instance equation (6) in [3]. There, provided that $h$ is small enough, the
asymptotic variance of the deconvolution kernel density estimator (or,
strictly speaking, an upper bound for it) also depends neither on the target
density $f$ nor on the point $x$. In this respect see also [46]. Such results
do not contradict the asymptotic normality result in [21], see Theorem 2.2 in
that paper, as there the asymptotic variance of the deconvolution kernel
density estimator is not evaluated.

Now we state a theorem concerning the consistency of $p_{ng}$, the estimator
of $p$.

Theorem 2.3. Assume that $\phi_f$ is integrable, let $E[X^2] < \infty$, and
let the kernel $k$ have a Fourier transform $\phi_k$ that is bounded by one
and integrates to two. Let $p_{ng}$ be defined as in (1.8). If $g$ is such
that $g^{4+4\alpha}e^{\sigma^2/g^2}n^{-1} \to 0$, then $p_{ng}$ is a
consistent estimator of $p$, i.e.

$$P(|p_{ng} - p| > \epsilon) \to 0$$

as $n \to \infty$ and $g \to 0$. Here $\epsilon$ is an arbitrary positive
number. Furthermore, under Conditions 1.1 and 1.3

$$E[(p_{ng} - p)^2] = O\left(g^{2+2\gamma} + \frac{g^{4+4\alpha}e^{\sigma^2/g^2}}{n}\right).$$

One can also show that $p_{ng}$ is asymptotically normal, when centred and
suitably normalised. We formulate the corresponding theorem below.

Theorem 2.4. Assume that the conditions of Theorem 2.3 hold. Let $p_{ng}$ be
defined as in (1.8) and let (1.15) hold. Then

$$\frac{\sqrt{n}}{g^{2+2\alpha}e^{\sigma^2/(2g^2)}}\,(p_{ng} - E[p_{ng}]) \xrightarrow{D} N\left(0, \frac{C^2}{2}(\Gamma(1+\alpha))^2\left(\frac{1}{\sigma^2}\right)^{2+2\alpha}\right)$$

as $n \to \infty$ and $g \to 0$.

Finally, we consider the case when both $p$ and $f$ are unknown. We state the
main theorem of the paper.

Theorem 2.5. Let $f^*_{nhg}(x)$ be defined by (1.9), $E[X^2] < \infty$ and let
Conditions 1.1-1.5 hold. Then, as $n \to \infty$, we have

$$\frac{\sqrt{n}}{h^{1+2\alpha}e^{\sigma^2/(2h^2)}}\,(f^*_{nhg}(x) - E[f^*_{nhg}(x)]) \xrightarrow{D} N\left(0, \frac{A^2}{2\pi^2(1-p)^2}\left(\frac{1}{\sigma^2}\right)^{2+2\alpha}(\Gamma(\alpha+1))^2\right).$$

Notice that the asymptotic variance is the same as in (2.3), which justifies
the plug-in approach to the construction of an estimator of $f$, when $p$ is
unknown.

A natural question to consider is what happens when we centre $f^*_{nhg}(x)$
not at its expectation, but at $f(x)$. This has practical importance as well,
e.g. for the construction of (asymptotic) confidence intervals. Writing

$$\frac{\sqrt{n}}{h^{1+2\alpha}e^{\sigma^2/(2h^2)}}\,(f^*_{nhg}(x) - f(x)) = \frac{\sqrt{n}}{h^{1+2\alpha}e^{\sigma^2/(2h^2)}}\,(f^*_{nhg}(x) - E[f^*_{nhg}(x)]) + \frac{\sqrt{n}}{h^{1+2\alpha}e^{\sigma^2/(2h^2)}}\,(E[f^*_{nhg}(x)] - f(x)),$$

we see that we have to study the second term here, i.e. to compare the
behaviour of the bias of $f^*_{nhg}(x)$ to the normalising factor
$\sqrt{n}\,h^{-(1+2\alpha)}e^{-\sigma^2/(2h^2)}$. We will study the bias of
$f^*_{nhg}(x)$ in two steps: first we will show that it asymptotically
vanishes, which itself is of independent interest. After this we will provide
conditions under which it asymptotically vanishes when multiplied by
$\sqrt{n}\,h^{-(1+2\alpha)}e^{-\sigma^2/(2h^2)}$. Recall the definition of a
Hölder class of functions $H(\beta, L)$.

Definition 2.1. A function $f$ is said to belong to the Hölder class
$H(\beta, L)$, if its derivatives up to order $l = \lfloor\beta\rfloor$ exist
and verify the condition

$$|f^{(l)}(x+t) - f^{(l)}(x)| \leq L|t|^{\beta-l}$$

for all $x, t \in \mathbb{R}$.

Such a smoothness condition on a target density $f$ is standard in kernel
density estimation, see e.g. p. 5 of [40]. Often one assumes that
$\beta = 2$. If $l = 0$, then set $f^{(l)} = f$. We also need the definition
of a kernel of order $l$. In particular, we will use the version given in
Definition 1.3 of [40].

Definition 2.2. A kernel $w$ is said to be a kernel of order $l$ for
$l \geq 1$, if the functions $x \mapsto x^jw(x)$ are integrable for
$j = 0, \ldots, l$ and if

$$\int_{-\infty}^{\infty} x^jw(x)\,dx = 0 \quad \text{for } j = 1, \ldots, l.$$

Theorem 2.6. Let $f^*_{nhg}(x)$ be defined by (1.9) and assume the conditions
of Theorem 2.5. Then, as $n \to \infty$, we have

$$E[f^*_{nhg}(x)] - f(x) \to 0.$$

If additionally $f \in H(\beta, L)$, $w$ is a kernel of order
$l = \lfloor\beta\rfloor$ and $\beta > 1 + 2\alpha$, then

$$\frac{\sqrt{n}}{h^{1+2\alpha}e^{\sigma^2/(2h^2)}}\,(E[f^*_{nhg}(x)] - f(x)) \to 0$$

as $n \to \infty$.

Combination of this theorem with Theorem 2.5 leads to the following result.

Theorem 2.7. Assume that the conditions of Theorem 2.6 hold. Then, as
$n \to \infty$, we have

$$\frac{\sqrt{n}}{h^{1+2\alpha}e^{\sigma^2/(2h^2)}}\,(f^*_{nhg}(x) - f(x)) \xrightarrow{D} N\left(0, \frac{A^2}{2\pi^2(1-p)^2}\left(\frac{1}{\sigma^2}\right)^{2+2\alpha}(\Gamma(\alpha+1))^2\right).$$

One should keep in mind that these results deal only with asymptotics. In
the next section we will study several simulation examples, which will provide
some insight into the finite sample properties of the estimator.

3. Simulation examples 

In this section we consider a number of simulation examples. We do not pretend 
to provide an exhaustive simulation study, rather an illustration, which requires 
further verification. 

Assume that $\sigma = 1$, $p = 0.1$ and that $f$ is normal with mean 3 and
variance 9. This results in a nontrivial deconvolution problem, because the
ratio of 'noise' compared to 'signal' is reasonably high:
NSR $= \mathrm{Var}[\sigma Z]/\mathrm{Var}[Y] \cdot 100\% \approx 11\%$. We
have simulated a sample of size $n = 1000$. As kernels $w$ and $k$ we selected
kernels (1.12) and (1.16), respectively. The bandwidths $h = 0.58$ and
$g = 0.5$ were selected by hand. A possible method of computing the estimate
is given in Section 4. The estimator $p_{ng}$ produced a value equal to 0.11.
The estimate of $f$ (bold dotted line), resulting from the procedure described
above, together with the target density $f$ (dashed line) is plotted in
Figure 5. For comparison purposes, we have also plotted the estimate
$f_{nh}(x)$ (it can be obtained using (1.5) and the true value of the
parameter $p$), see Figure 6. As can be seen from the comparison of these two
figures, the estimates $f^*_{nhg}$ and $f_{nh}$ look rather similar.

As the second example we consider the case when $f$ is a gamma density with
parameters $\alpha = 8$ and $\beta = 1$, i.e.

$$f(x) = \frac{x^7e^{-x}}{\Gamma(8)}\,1_{[x>0]}, \tag{3.1}$$

and $p = 0.25$. We simulated a sample of size $n = 1000$. The kernels were
chosen as above and the bandwidths $g = 0.6$ and $h = 0.6$ were selected by
hand. The estimate $p_{ng}$ took a value approximately equal to 0.23. The
resulting estimate $f^*_{nhg}$ is plotted in Figure 7. As above we also
plotted the estimate $f_{nh}$, see Figure 8 (notice that the estimate takes on
negative values in the neighbourhood of zero). Again both figures look
similar.

Fig 5. The normal density $f$ (dashed line) and the estimate $f^*_{nhg}$
(solid line). The sample size n = 1000.

Fig 6. The normal density $f$ (dashed line) and the estimate $f_{nh}$ (solid
line). The sample size n = 1000.

Fig 7. The gamma density (dashed line) and the estimate $f^*_{nhg}$ (solid
line). The sample size n = 1000.

Fig 8. The gamma density (dashed line) and the estimate $f_{nh}$ (solid
line). The sample size n = 1000.

Fig 9. The histogram of estimates of p for g = 0.5 and the sample size
n = 1000.

Fig 10. The histogram of estimates of p for g = 0.55 and the sample size
n = 1000.

Examination of these figures leads us to two questions: how well does
$p_{ng}$ estimate $p$ for moderate sample sizes? How sensitive is $f^*_{nhg}$
to under- or overestimation of $p$? To get at least a partial answer to the
first question, we considered the same model as in our first example in this
section (i.e. deconvolution of the normal density) and repeatedly, i.e. 1000
times, estimated $p$ for the bandwidth $g = 0.5$ and the sample size
$n = 1000$ for each simulation run. Then the same procedure was repeated for
the bandwidths $g = 0.55$, 0.6 and 0.65. The resulting histograms are plotted
in Figures 9-12. They look quite satisfactory. The sample means and sample
standard deviations (SD) of estimates of $p$ for different choices of
bandwidth $g$ together with the theoretical standard deviations are summarised
in Table 1. One notices that the sample means in Table 1 are close to the true
value 0.1 of the parameter $p$. The asymptotic standard deviations in the same
table were computed using Theorem 2.4, which predicts that they should be
equal to (recall that in our case $\alpha = 2$)

$$\frac{g^6e^{1/(2g^2)}}{\sqrt{n}}\,C\sqrt{2}.$$

Fig 11. The histogram of estimates of p for g = 0.6 and the sample size
n = 1000.

Fig 12. The histogram of estimates of p for g = 0.65 and the sample size
n = 1000.

Table 1
Sample and theoretical means and standard deviations (SD) of estimates of p
for different choices of bandwidth g. The sample size n = 1000

Bandwidth         0.5      0.55     0.6      0.65
Sample mean       0.0963   0.0975   0.0960   0.0927
Sample SD         0.0516   0.0436   0.0388   0.0349
Asymptotic SD     1.7891   2.2399   2.8994   3.8164
Theoretical SD    0.0700   0.0593   0.0487   0.0432
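The asymptotic standard deviation implied by Theorem 2.4 is easy to evaluate directly. The sketch below does so for general $\sigma$ and $\alpha$, reading the standard deviation off the limiting variance $\frac{C^2}{2}(\Gamma(1+\alpha))^2\sigma^{-(4+4\alpha)}$ and dividing by the normalising factor. The function name and argument defaults are illustrative; the default constant $C = 693/2$ corresponds to the kernel (1.16).

```python
import math


def asymptotic_sd(g, n, sigma=1.0, alpha=2, c=693.0 / 2.0):
    """Asymptotic SD of p_ng suggested by Theorem 2.4.

    Equals g^(2+2a) * exp(sigma^2 / (2 g^2)) / sqrt(n)
    times (C / sqrt(2)) * Gamma(1 + a) * sigma^(-(2 + 2a)).
    For sigma = 1 and a = 2 this is g^6 * exp(1 / (2 g^2)) / sqrt(n) * C * sqrt(2).
    """
    rate = g ** (2 + 2 * alpha) * math.exp(sigma ** 2 / (2.0 * g * g)) / math.sqrt(n)
    scale = c * math.gamma(1 + alpha) / math.sqrt(2.0) * sigma ** (-(2 + 2 * alpha))
    return rate * scale


# Values for sigma = 1, n = 1000 and the four bandwidths of the first example.
row = [asymptotic_sd(g, 1000) for g in (0.5, 0.55, 0.6, 0.65)]
```

With $\sigma = 1$ and $n = 1000$ these values agree with the 'Asymptotic SD' row of Table 1 to the printed precision.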

From Table 1 one sees that there is a large discrepancy between the sample
standard deviations and the asymptotic standard deviations predicted by the
theory. The explanation of this discrepancy lies in the fact that the proof of
the asymptotic normality of $p_{ng}$ heavily relies on the asymptotic
equivalence

$$\int_0^1 \phi_k(s)e^{\sigma^2s^2/(2g^2)}\,ds \sim C\,\Gamma(1+\alpha)\left(\frac{g^2}{\sigma^2}\right)^{1+\alpha}e^{\sigma^2/(2g^2)}, \tag{3.2}$$

see Lemma 5.1 and the proof of Lemma 5.2 in Section 5 below. However, by
direct evaluation of the integral on the left-hand side of (3.2) for different
values of $g$, it can be seen that this relation does not provide an accurate
approximation in those cases where the bandwidth is relatively large, as it
actually is in our case. It then follows that the asymptotic standard
deviation will not provide a good approximation of the sample standard
deviation unless the bandwidth is very small. This in turn implies that the
corresponding sample size must be extremely large. We can correct for this
poor approximation of the integral in (3.2) by using the integral itself as a
normalising factor instead of the right-hand side of (3.2). The results of
this correction are represented in the last line of Table 1. As can be seen,
the theoretical standard deviation and the sample standard deviation are much
closer to each other. Since the kernel $k$ was selected more or less
arbitrarily, one is tempted to believe that an inaccurate approximation in
(3.2) might be due to the kernel. This might be the case, however to a certain
degree this seems to be characteristic of all popular kernels employed in
kernel deconvolution. Consider for instance the kernel

$$w(x) = \frac{48x(x^2 - 15)\cos x - 144(2x^2 - 5)\sin x}{\pi x^7}. \tag{3.3}$$

Its Fourier transform is given by

$$\phi_w(t) = (1-t^2)^3\,1_{[|t|\leq 1]}.$$

The kernel $w$ and its Fourier transform are plotted in Figures 13 and 14,
respectively. This kernel was used for simulations in [22] and [47] and it was
shown in [13] that it performs well in a deconvolution setting. Notice that
this kernel cannot be used to estimate $p$ if we want to plug in the resulting
estimator $\hat p_{ng}$ into $f^*_{nhg}$. However, this kernel satisfies
Condition 1.2 and can be used to estimate $f$. Nevertheless, the ratio of the
left and right hand sides in (3.2) for $h = 0.5$ is equal to 0.4299, which is
still far from 1. This issue is further discussed in [42]. Another issue here
is that often the error variance $\sigma^2$ is quite small and it is sensible
to treat $\sigma$ as depending on the sample size $n$ (with $\sigma \to 0$ as
$n \to \infty$), see [9]. However, this is a different model and this question
is not addressed here. Notice also that a perfect match between the sample
standard deviation and the theoretical standard deviation is impossible to
obtain, because we neglect a remainder term when computing the latter. How
large the contribution of the remainder term can be in general requires a
separate simulation study.

Fig 13. The kernel (3.3).

Fig 14. The Fourier transform of the kernel (3.3).

We also considered the case when the error term variance and the sample size
are smaller (the target density $f$ was again the standard normal density,
while $p$ was set to be 0.1). In particular, we took $\sigma = 0.3$ and
$n = 500$. The corresponding histograms are given in Figures 15-18, while the
sample and theoretical characteristics for four different choices of the
bandwidth $g = 0.5$, 0.55, 0.6 and 0.65 are summarised in Table 2. Notice a
particularly bad match between the asymptotic standard deviation and its
empirical counterpart. Other conclusions are similar to those in the previous
example.
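The quality of the approximation (3.2) can be checked by direct numerical integration. The sketch below evaluates the ratio of the two sides by Simpson's rule. It assumes $\sigma = 1$ and a bandwidth of 0.5; for the kernel (3.3) it uses $\phi_w(s) = (1-s^2)^3$, for which $\phi_w(1-t) \approx 8t^3$, so the constants are taken to be $A = 8$ and $\alpha = 3$. The helper names `simpson` and `ratio_32` are hypothetical.

```python
import math


def simpson(f, a, b, m=2000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    step = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * f(a + i * step)
    return total * step / 3.0


def ratio_32(phi, const, alpha, g, sigma=1.0):
    """Left-hand side of (3.2) divided by its right-hand side."""
    lhs = simpson(lambda s: phi(s) * math.exp(sigma ** 2 * s * s / (2.0 * g * g)), 0.0, 1.0)
    rhs = (const * math.gamma(1 + alpha) * (g * g / sigma ** 2) ** (1 + alpha)
           * math.exp(sigma ** 2 / (2.0 * g * g)))
    return lhs / rhs


# Kernel (3.3): phi(s) = (1 - s^2)^3, so that phi(1 - t) ~ 8 t^3.
ratio_w = ratio_32(lambda s: (1.0 - s * s) ** 3, 8.0, 3, 0.5)

# Kernel (1.16): phi(s) = (693/8) s^6 (1 - s^2)^2 with C = 693/2 and alpha = 2.
ratio_k = ratio_32(lambda s: 693.0 / 8.0 * s ** 6 * (1.0 - s * s) ** 2, 693.0 / 2.0, 2, 0.5)
```

Under these assumptions `ratio_w` is about 0.43, in line with the value quoted in the text, while `ratio_k` is an order of magnitude smaller. This is consistent with the large gap between the asymptotic and sample standard deviations in Table 1.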
Fig 15. The histogram of estimates of p for g = 0.45 and the sample size
n = 500.

Fig 16. The histogram of estimates of p for g = 0.5 and the sample size
n = 500.

Fig 17. The histogram of estimates of p for g = 0.6 and the sample size
n = 500.

Fig 18. The histogram of estimates of p for g = 0.65 and the sample size
n = 500.



To test the robustness of the estimator $f^*_{nhg}$ with respect to the
estimated value of $p$, we again turned to the model that was considered in
the first example of this section. Instead of $\hat p_{ng}$, three different
values $p = 0.05$, $p = 0.1$ and $p = 0.15$ were plugged into (4.2). The
resulting estimates $f^*_{nhg}$ are plotted in Figure 19 (the true density is
represented by the dashed line). As one can see from Figure 19, under- or
overestimation of $p$ in the given range does not have a significant impact on
the resulting estimate $f^*_{nhg}$ (of course one should keep in mind that $p$
is relatively small in this case). On the other hand, if the value of $p$ were
larger, e.g. if $p = 0.2$, that would have a noticeable effect, e.g. it could
have suggested bimodality in the case where the density is actually unimodal,
see Figure 20. At the same time the simulated examples concerning the
estimates $p_{ng}$ that we considered above seem to suggest that such
instances of unsatisfactory estimates of $p$ are not too frequent, because
most of the observed values of $p_{ng}$ are concentrated in the interval
$[0.05, 0.15]$. We also considered the case when
$f = 0.5\phi_{-2,1} + 0.5\phi_{2,1}$, where $\phi_{x,y}$ denotes the normal
density with mean $x$ and variance $y$. Hence in this case $f$ is a mixture of
two normal densities and it is also bimodal. The match is visually slightly
worse for $p = 0.2$, but it is still acceptable.
Table 2. Sample and theoretical means and standard deviations (SD) of estimates of p for different
choices of the bandwidth g. The sample size is n = 500.

Bandwidth g       0.45      0.5       0.6       0.65
Sample mean       0.0972    0.0977    0.0959    0.0930
Sample SD         0.0277    0.0269    0.0283    0.0295
Asymptotic SD     311.7     562.3     2247      2521
Theoretical SD    0.0357    0.0349    0.0338    0.0335




Fig 19. The normal density f and estimates $f_{nhg}$ evaluated for p = 0.05, p = 0.1,
p = 0.15 and the sample size n = 1000.

Fig 20. The normal density f and estimate $f_{nhg}$ evaluated for p = 0.2 and the sample
size n = 1000.



The simulation examples that we considered in this section suggest that,
despite the slow (logarithmic) rate of convergence, the estimator $f_{nhg}$ works in
practice (given that $p$ is estimated accurately). This is somewhat comparable to
the classical deconvolution problem, where by finite sample calculations it was
shown in [47] that for lower levels of noise, the kernel estimators perform well for
reasonable sample sizes, in spite of the slow rates of convergence for the supersmooth
deconvolution problem, obtained e.g. in [21] and [22]. However, Condition 1.4
tells us that the bandwidths $h$ and $g$ have to be of order $(\log n)^{-1/2}$. In practice
this implies that to obtain reasonable estimates, the bandwidths have to be
selected fairly large, even for large samples.
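
To get a feeling for how slowly a bandwidth of order $(\log n)^{-1/2}$ decreases, one can tabulate it for a range of sample sizes. The short Python fragment below is only an illustration: the proportionality constant (here $\sigma = 1$) and the function name `log_rate_bandwidth` are assumptions and do not reproduce the exact form of Condition 1.4.

```python
import math

def log_rate_bandwidth(n, sigma=1.0):
    """Bandwidth of order (log n)^(-1/2); the constant sigma is an assumption."""
    return sigma / math.sqrt(math.log(n))

sizes = [10**k for k in range(2, 9)]
bandwidths = [log_rate_bandwidth(n) for n in sizes]
for n, b in zip(sizes, bandwidths):
    print(f"n = {n:>9d}   bandwidth = {b:.3f}")
```

Since $\log 10^8 = 4 \log 10^2$, increasing the sample size from $10^2$ to $10^8$ merely halves such a bandwidth.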

One more practical issue concerning the implementation of the estimator
$f_{nhg}$ (or $p_{ng}$) is the method of bandwidth selection, which is not addressed in
this paper. We expect that techniques similar to those used in the classical
deconvolution problem will produce comparable results in our problem. This
requires a separate investigation of the behaviour of the mean integrated squared
error of $f_{nhg}$. For the classical deconvolution problem, papers that
consider the issue of data-dependent bandwidth selection are [10, 11, 18, 28]
and [39]. Yet another issue is the choice of the kernels $w$ and $k$. For the case of
the classical deconvolution problem we refer to [13]. In kernel density
estimation in general, the choice of a kernel is thought to be of less importance for the
performance of an estimator than the choice of the bandwidth, see e.g. p. 31 in
[48], or p. 132 in [49].
Fig 21. The mixture of normal densities f and estimates $f_{nhg}$ evaluated for p = 0.05,
p = 0.1, p = 0.15 and the sample size n = 1000.

Fig 22. The mixture of normal densities f and estimate $f_{nhg}$ evaluated for p = 0.2
and the sample size n = 1000.



4. Computational issues 

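To compute the estimator $f_{nhg}$ in Section 3, a method similar to the one used
in [44] (in turn motivated by [5]) can be employed. Namely, notice that

$$\tilde f_{nh}(x) = f^{(1)}_{nh}(x) + f^{(2)}_{nh}(x),$$

where

$$f^{(1)}_{nh}(x) = \frac{1}{2\pi}\int_0^\infty e^{-itx}\,\frac{\phi_{emp}(t)\phi_w(ht)}{e^{-\sigma^2t^2/2}}\,dt, \qquad
f^{(2)}_{nh}(x) = \frac{1}{2\pi}\int_0^\infty e^{itx}\,\frac{\phi_{emp}(-t)\phi_w(ht)}{e^{-\sigma^2t^2/2}}\,dt.$$

Using the trapezoid rule and setting $v_j = \eta(j - 1)$, we have

$$f^{(1)}_{nh}(x) \approx \frac{1}{2\pi}\sum_{j=1}^{N} e^{-iv_jx}\,\psi(v_j)\,\eta,$$

where $N$ is some power of 2 and

$$\psi(v_j) = \frac{\phi_{emp}(v_j)\phi_w(hv_j)}{e^{-\sigma^2v_j^2/2}}.$$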



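The Fast Fourier Transform is used to compute the values of $f^{(1)}_{nh}$ at $N$ different
points (concerning the application of the Fast Fourier Transform in kernel
deconvolution see [12]). We employ a regular spacing size $\delta$, so that the values of
$x$ are

$$x_u = -\frac{N\delta}{2} + \delta(u - 1),$$

where $u = 1, \ldots, N$. Therefore, we obtain

$$f^{(1)}_{nh}(x_u) \approx \frac{1}{2\pi}\sum_{j=1}^{N} e^{-i\delta\eta(j-1)(u-1)}\,e^{iv_jN\delta/2}\,\psi(v_j)\,\eta.$$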
In order to apply the Fast Fourier Transform, note that we must take

$$\delta\eta = \frac{2\pi}{N}.$$

It follows that a small $\eta$, which is needed to achieve greater accuracy in
integration, will result in values of $x$ which are relatively far from each other. Therefore,
to improve the integration precision, we will apply Simpson's rule, i.e.

$$f^{(1)}_{nh}(x_u) \approx \frac{1}{2\pi}\sum_{j=1}^{N} e^{-i\frac{2\pi}{N}(j-1)(u-1)}\,e^{iv_jN\delta/2}\,\psi(v_j)\,\frac{\eta}{3}\left(3 + (-1)^j - \delta_{j-1}\right),$$

where $\delta_j$ denotes the Kronecker symbol (recall that $\delta_j$ is 1 if $j = 0$ and is 0
otherwise). The same reasoning can be applied to $f^{(2)}_{nh}(x)$. The estimate $f_{nhg}$
can then be computed by noticing that

$$f_{nhg}(x) = \frac{\tilde f_{nh}(x)}{1 - p_{ng}} - \frac{p_{ng}}{1 - p_{ng}}\,w_h(x). \tag{4.2}$$

One should keep in mind that even though $w_h$ can be evaluated directly, it is
preferable to use the Fast Fourier Transform for its computation, thus avoiding
possible numerical issues, see [12]. Also notice that the direct computation of
$\phi_{emp}$ is rather time-demanding for large samples. One way to avoid this problem
is to use WARPing, cf. [27]. However, for the purposes of the present study, we
restricted ourselves to the direct evaluation of $\phi_{emp}$.
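
The discretisation described in this section can be checked on a case with a known answer. The Python sketch below is an illustration under stated assumptions and not the implementation used for the simulations: it takes $\phi_{emp} \equiv 1$ and $\sigma = 0$, so that $\psi(v) = \phi_w(hv)$ and the quantity computed is $w_h$ itself, builds the Simpson weights $\frac{\eta}{3}(3 + (-1)^j - \delta_{j-1})$, and compares the Fast Fourier Transform (via `numpy.fft.fft`, whose sign convention matches the sum above) with direct summation. The values $N = 1024$, $\eta = 0.01$ and $h = 0.5$ are hypothetical.

```python
import numpy as np

def phi_w(t):
    """Fourier transform (1 - t^2)^3 on [-1, 1] of the kernel w."""
    return np.where(np.abs(t) <= 1.0, (1.0 - t**2) ** 3, 0.0)

N, eta, h = 1024, 0.01, 0.5                 # hypothetical grid size, step and bandwidth
delta = 2.0 * np.pi / (N * eta)             # FFT constraint: delta * eta = 2 pi / N
j = np.arange(1, N + 1)
v = eta * (j - 1)                           # integration nodes v_j
x = -N * delta / 2.0 + delta * (j - 1)      # evaluation points x_u

# Simpson weights (eta / 3) * (3 + (-1)^j - delta_{j-1}) = eta / 3 * (1, 4, 2, 4, ...).
weights = eta / 3.0 * (3.0 + (-1.0) ** j - (j == 1))

psi = phi_w(h * v)                          # psi for phi_emp = 1 and sigma = 0 (assumption)
a = np.exp(1j * v * N * delta / 2.0) * psi * weights

f1_fft = np.fft.fft(a) / (2.0 * np.pi)      # all N values at once
f1_direct = (np.exp(-1j * np.outer(x, v)) * (psi * weights)).sum(axis=1) / (2.0 * np.pi)

# phi_emp(-t) is the complex conjugate of phi_emp(t), so f^(1) + f^(2) = 2 Re f^(1).
w_h = 2.0 * f1_fft.real
print(np.max(np.abs(f1_fft - f1_direct)))
print(w_h[N // 2], 16.0 / (35.0 * np.pi * h))   # value at x = 0 versus closed form
```

With these settings the spacing of the points $x_u$ is $\delta = 2\pi/(N\eta) \approx 0.61$, which illustrates the trade-off between a small $\eta$ and a fine grid in $x$ mentioned above.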



5. Proofs 



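Proof of Theorem 2.1. The proof is elementary and is based on the definition
of $f_{nh}(x)$. By Fubini's theorem we have

$$E\left[\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,\frac{\phi_{emp}(t)\phi_w(ht)}{e^{-\sigma^2t^2/2}}\,dt\right] = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,\phi_Y(t)\phi_w(ht)\,dt.$$

Recalling that $\phi_Y(t) = p + (1 - p)\phi_f(t)$, we obtain

$$E[\tilde f_{nh}(x)] = p\,w_h(x) + (1 - p)\,f * w_h(x).$$

Here we used the facts that

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_w(ht)\,dt = w_h(x), \qquad
\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_f(t)\phi_w(ht)\,dt = f * w_h(x). \tag{5.1}$$

This concludes the proof. □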



The proof of Theorem 2.2 is based on the following three lemmas, all of which 
are reformulations of results from [46]. 
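Lemma 5.1. Assume Condition 1.2. For $h \to 0$ and $\delta \geq 0$ fixed we have

$$\int_0^1 (1 - s)^{\delta}\phi_w(s)e^{\sigma^2s^2/(2h^2)}\,ds \sim A\,\Gamma(1 + \alpha + \delta)\left(\frac{1}{\sigma^2}\right)^{1+\alpha+\delta} h^{2(1+\alpha+\delta)}e^{\sigma^2/(2h^2)}. \tag{5.2}$$

Proof. We follow the same line of thought as in [46]. Using the substitution
$s = 1 - h^2v$ and the dominated convergence theorem in the one but last step,
we get

$$\begin{aligned}
\int_0^1 (1 - s)^{\delta}\phi_w(s)e^{\sigma^2s^2/(2h^2)}\,ds
&= h^2\int_0^{1/h^2} (h^2v)^{\delta}\,\phi_w(1 - h^2v)\,e^{\sigma^2(1 - h^2v)^2/(2h^2)}\,dv \\
&= h^{2+2\alpha+2\delta}e^{\sigma^2/(2h^2)}\int_0^{1/h^2} \frac{\phi_w(1 - h^2v)}{(h^2v)^{\alpha}}\,v^{\alpha+\delta}\,e^{-\sigma^2v + \sigma^2h^2v^2/2}\,dv \\
&\sim h^{2+2\alpha+2\delta}e^{\sigma^2/(2h^2)}\,A\int_0^{\infty} v^{\alpha+\delta}e^{-\sigma^2v}\,dv \\
&= h^{2+2\alpha+2\delta}e^{\sigma^2/(2h^2)}\left(\frac{1}{\sigma^2}\right)^{1+\alpha+\delta} A\,\Gamma(\alpha + \delta + 1).
\end{aligned}$$

The lemma is proved. □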
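
The asymptotic equivalence (5.2) can be verified numerically for a concrete kernel. The Python sketch below does so for $\phi_w(s) = (1 - s^2)^3$ with $\delta = 0$, under the assumption that Condition 1.2 has the form $\phi_w(1 - t) = At^{\alpha} + o(t^{\alpha})$ as $t \downarrow 0$, which for this kernel gives $A = 8$ and $\alpha = 3$. Both sides are divided by $e^{\sigma^2/(2h^2)}$ to avoid large numbers; the function names and the choice $\sigma = 1$ are hypothetical.

```python
import math
import numpy as np

def phi_w(s):
    # Near s = 1: phi_w(1 - t) = (t (2 - t))^3 ~ 8 t^3, i.e. A = 8 and alpha = 3.
    return (1.0 - s**2) ** 3

def scaled_integral(h, sigma, m=400001):
    """Left-hand side of (5.2) with delta = 0, divided by exp(sigma^2 / (2 h^2))."""
    s = np.linspace(0.0, 1.0, m)
    y = phi_w(s) * np.exp(sigma**2 * (s**2 - 1.0) / (2.0 * h**2))
    return float(np.sum((y[1:] + y[:-1]) * np.diff(s)) / 2.0)   # trapezoid rule

def scaled_limit(h, sigma, A=8.0, alpha=3.0):
    """Right-hand side of (5.2) with delta = 0, divided by exp(sigma^2 / (2 h^2))."""
    return A * math.gamma(1.0 + alpha) * (h**2 / sigma**2) ** (1.0 + alpha)

sigma = 1.0
ratios = {h: scaled_integral(h, sigma) / scaled_limit(h, sigma)
          for h in (0.4, 0.2, 0.1, 0.05)}
for h, r in ratios.items():
    print(f"h = {h:4.2f}   ratio = {r:.4f}")
```

The ratio of the two sides approaches one as $h$ decreases, in agreement with the lemma.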

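Lemma 5.2. Assume Condition 1.2 and let $E[X^2] < \infty$. Furthermore, let $\tilde f_{nh}$
be defined by (1.6). Then, as $n \to \infty$ and $h \to 0$,

$$\frac{\sqrt{n}}{h^{1+2\alpha}e^{\sigma^2/(2h^2)}}\left(\tilde f_{nh}(x) - E[\tilde f_{nh}(x)]\right)
= \frac{A}{\pi}\left(\frac{1}{\sigma^2}\right)^{1+\alpha}\left(\Gamma(\alpha + 1) + o(1)\right)U_{nh}(x) + O_P(h),$$

where

$$U_{nh}(x) = \frac{1}{\sqrt{n}}\sum_{j=1}^{n}\left(\cos\left(\frac{X_j - x}{h}\right) - E\left[\cos\left(\frac{X_j - x}{h}\right)\right]\right). \tag{5.3}$$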

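Proof. We have

$$\begin{aligned}
\tilde f_{nh}(x) &= \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,\frac{\phi_w(ht)}{e^{-\sigma^2t^2/2}}\,\phi_{emp}(t)\,dt \\
&= \frac{1}{2\pi h}\int_{-1}^{1} e^{-isx/h}\,\phi_w(s)\,e^{\sigma^2s^2/(2h^2)}\,\phi_{emp}\left(\frac{s}{h}\right)ds \\
&= \frac{1}{2\pi nh}\sum_{j=1}^{n}\int_{-1}^{1} e^{is(X_j - x)/h}\,\phi_w(s)\,e^{\sigma^2s^2/(2h^2)}\,ds \\
&= \frac{1}{\pi nh}\sum_{j=1}^{n}\int_{0}^{1} \cos\left(s\,\frac{X_j - x}{h}\right)\phi_w(s)\,e^{\sigma^2s^2/(2h^2)}\,ds.
\end{aligned}$$

Notice that

$$\begin{aligned}
\cos\left(s\,\frac{X_j - x}{h}\right)
&= \cos\left(\frac{X_j - x}{h}\right) + \left(\cos\left(s\,\frac{X_j - x}{h}\right) - \cos\left(\frac{X_j - x}{h}\right)\right) \\
&= \cos\left(\frac{X_j - x}{h}\right) - 2\sin\left(\frac{1}{2}(s + 1)\frac{X_j - x}{h}\right)\sin\left(\frac{1}{2}(s - 1)\frac{X_j - x}{h}\right) \\
&= \cos\left(\frac{X_j - x}{h}\right) + R_{n,j}(s),
\end{aligned}$$

where $R_{n,j}(s)$ is a remainder term satisfying

$$|R_{n,j}(s)| \leq (|x| + |X_j|)\,\frac{1 - s}{h}, \qquad 0 \leq s \leq 1. \tag{5.4}$$

The bound follows from the inequality $|\sin x| \leq |x|$.
By Lemma 5.1, $\tilde f_{nh}(x)$ equals

$$\begin{aligned}
&\frac{1}{\pi nh}\int_0^1 \phi_w(s)e^{\sigma^2s^2/(2h^2)}\,ds\,\sum_{j=1}^{n}\cos\left(\frac{X_j - x}{h}\right) + \frac{1}{n}\sum_{j=1}^{n}\tilde R_{n,j} \\
&\quad= \frac{A}{\pi}\left(\Gamma(\alpha + 1) + o(1)\right)\left(\frac{1}{\sigma^2}\right)^{1+\alpha} h^{1+2\alpha}e^{\sigma^2/(2h^2)}\,\frac{1}{n}\sum_{j=1}^{n}\cos\left(\frac{X_j - x}{h}\right) + \frac{1}{n}\sum_{j=1}^{n}\tilde R_{n,j},
\end{aligned}$$

where

$$\tilde R_{n,j} = \frac{1}{\pi}\,\frac{1}{h}\int_0^1 R_{n,j}(s)\,\phi_w(s)\,e^{\sigma^2s^2/(2h^2)}\,ds.$$

For the remainder we have, by (5.4) and Lemma 5.1,

$$\begin{aligned}
|\tilde R_{n,j}| &\leq \frac{1}{\pi}(|x| + |X_j|)\,\frac{1}{h^2}\int_0^1 (1 - s)\,\phi_w(s)\,e^{\sigma^2s^2/(2h^2)}\,ds \\
&= \frac{A}{\pi}(|x| + |X_j|)\left(\Gamma(\alpha + 2) + o(1)\right)h^{2+2\alpha}\left(\frac{1}{\sigma^2}\right)^{2+\alpha}e^{\sigma^2/(2h^2)}.
\end{aligned}$$

Consequently,

$$\mathrm{Var}[\tilde R_{n,j}] \leq E[\tilde R_{n,j}^2] = O\left(h^{4+4\alpha}e^{\sigma^2/h^2}\right)$$

and

$$\frac{1}{n}\sum_{j=1}^{n}\left(\tilde R_{n,j} - E[\tilde R_{n,j}]\right) = O_P\left(\frac{h^{2+2\alpha}e^{\sigma^2/(2h^2)}}{\sqrt{n}}\right),$$

which follows from Chebyshev's inequality. Finally, we get

$$\frac{\sqrt{n}}{h^{1+2\alpha}e^{\sigma^2/(2h^2)}}\left(\tilde f_{nh}(x) - E[\tilde f_{nh}(x)]\right)
= \frac{A}{\pi}\left(\Gamma(\alpha + 1) + o(1)\right)\left(\frac{1}{\sigma^2}\right)^{1+\alpha}U_{nh}(x) + O_P(h),$$

and this completes the proof of the lemma. □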
The next lemma establishes the asymptotic normality.

Lemma 5.3. Assume conditions of Lemma 5.2 and let, for a fixed $x$, $U_{nh}(x)$
be defined by (5.3). Then, as $n \to \infty$ and $h \to 0$,

$$U_{nh}(x) \xrightarrow{\mathcal{D}} N\left(0, \frac{1}{2}\right).$$

Proof. Write

$$Y_j = \frac{X_j - x}{h} \mod 2\pi.$$

For $0 \leq y < 2\pi$ we have

$$\begin{aligned}
P(Y_j \leq y) &= \sum_{k=-\infty}^{\infty} P(2k\pi h + x \leq X_j \leq 2k\pi h + yh + x) \\
&= \sum_{k=-\infty}^{\infty}\int_{2k\pi h + x}^{2k\pi h + yh + x} q(u)\,du
= \sum_{k=-\infty}^{\infty} yh\,q(\xi_{k,h}) \sim \frac{y}{2\pi}\int_{-\infty}^{\infty} q(u)\,du = \frac{y}{2\pi},
\end{aligned}$$

where $\xi_{k,h}$ is a point in the interval $[2k\pi h + x, 2k\pi h + yh + x] \subset [2k\pi h + x, 2(k + 1)\pi h + x]$.
Since $h \to 0$, the last equivalence follows from a Riemann sum approximation
of the integral and continuity of the density $q$ of $X$. Consequently, as
$h \to 0$, we have $Y_j \xrightarrow{\mathcal{D}} U$, where $U$ is uniformly distributed on the interval $[0, 2\pi]$.
Since the cosine is bounded and continuous, it then follows by the dominated
convergence theorem that $E[|\cos Y_j|^a] \to E[|\cos U|^a]$, for all $a > 0$. Therefore

$$E\left[\cos\left(\frac{X_j - x}{h}\right)\right] \to E[\cos U] = 0$$

and

$$E\left[\left(\cos\left(\frac{X_j - x}{h}\right)\right)^2\right] \to E[(\cos U)^2] = \frac{1}{2}.$$

To prove asymptotic normality of $U_{nh}(x)$, first note that it is a normalised
sum of i.i.d. random variables. We will verify that the conditions for asymptotic
normality in the triangular array scheme of Theorem 7.1.2 in [8] hold
(Lyapunov's condition). In our case this reduces to the verification of the fact that

$$\frac{\sum_{j=1}^{n} E\left[|\cos Y_j - E[\cos Y_j]|^3\right]}{n^{3/2}\left(\mathrm{Var}[\cos Y_j]\right)^{3/2}}
= \frac{E\left[|\cos Y_1 - E[\cos Y_1]|^3\right]}{n^{1/2}\left(\mathrm{Var}[\cos Y_1]\right)^{3/2}} \to 0.$$

Now notice that

$$\frac{E\left[|\cos Y_1 - E[\cos Y_1]|^3\right]}{n^{1/2}\left(\mathrm{Var}[\cos Y_1]\right)^{3/2}} \sim \frac{E\left[|\cos U|^3\right]}{n^{1/2}\left(\mathrm{Var}[\cos U]\right)^{3/2}} \to 0$$

as $n \to \infty$. Consequently, $U_{nh}(x)$ is asymptotically normal with mean zero and variance $1/2$. □
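
The key step of the proof, namely that $(X_j - x)/h \mod 2\pi$ is approximately uniform on $[0, 2\pi]$ for small $h$, is easy to observe in a simulation. The Python fragment below is a sketch under assumptions: a standard normal sample is used as a stand-in for $X$ (any continuous density $q$ would serve), and the point $x = 0.7$, the sample size and the seed are hypothetical. It reports the sample versions of $E[\cos Y_j]$ and $E[(\cos Y_j)^2]$, whose limits are $0$ and $1/2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, x = 200_000, 0.7                          # hypothetical sample size and point x
sample = rng.normal(0.0, 1.0, size=n)        # stand-in for X (an assumption)

results = {}
for h in (1.0, 0.3, 0.01):
    y = np.mod((sample - x) / h, 2.0 * np.pi)        # Y_j = (X_j - x)/h mod 2 pi
    results[h] = (float(np.cos(y).mean()), float((np.cos(y) ** 2).mean()))
    print(f"h = {h:5.2f}   mean cos = {results[h][0]: .4f}   mean cos^2 = {results[h][1]:.4f}")
```

For $h = 1$ the first moment is visibly different from zero, while for small $h$ both moments are close to their limiting values, which is what makes the limit variance of $U_{nh}(x)$ equal to $1/2$.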



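The following corollary immediately follows from Lemmas 5.2 and 5.3.

Corollary 5.1. Under the conditions of Lemma 5.2 we have that

$$\frac{\sqrt{n}}{h^{1+2\alpha}e^{\sigma^2/(2h^2)}}\left(\tilde f_{nh}(x) - E[\tilde f_{nh}(x)]\right) \xrightarrow{\mathcal{D}} N\left(0, \frac{A^2}{2\pi^2}\left(\frac{1}{\sigma^2}\right)^{2+2\alpha}\left(\Gamma(\alpha + 1)\right)^2\right).$$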

Now we prove Theorem 2.2.

Proof of Theorem 2.2. From (1.4) we have that

$$f_{nh}(x) - E[f_{nh}(x)] = \frac{1}{1 - p}\left(\tilde f_{nh}(x) - E[\tilde f_{nh}(x)]\right),$$

where $\tilde f_{nh}$ is the estimator defined by (1.6). Hence the result follows from Corollary 5.1. □

The following lemma gives the order of the variance of $f_{nh}(x)$.

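Lemma 5.4. Let Condition 1.2 hold and $f_{nh}(x)$ be defined as in (1.4). Then,
as $n \to \infty$ and $h \to 0$,

$$\mathrm{Var}[f_{nh}(x)] = O\left(\frac{1}{n}\,h^{2(1+2\alpha)}e^{\sigma^2/h^2}\right).$$

Proof. We have

$$\mathrm{Var}[f_{nh}(x)] = \frac{1}{4\pi^2(1 - p)^2nh^2}\,\mathrm{Var}\left[\int_{-1}^{1} e^{is(X_1 - x)/h}\,\phi_w(s)\,e^{\sigma^2s^2/(2h^2)}\,ds\right].$$

Notice that

$$\mathrm{Var}\left[\int_{-1}^{1} e^{is(X_1 - x)/h}\,\phi_w(s)\,e^{\sigma^2s^2/(2h^2)}\,ds\right] \leq \left(\int_{-1}^{1} |\phi_w(s)|\,e^{\sigma^2s^2/(2h^2)}\,ds\right)^2.$$

Recalling Lemma 5.1, we conclude that

$$\mathrm{Var}[f_{nh}(x)] = O\left(\frac{1}{n}\,h^{2(1+2\alpha)}e^{\sigma^2/h^2}\right).$$

□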



Next we deal with consistency of $p_{ng}$ and prove Theorem 2.3.

Proof of Theorem 2.3. We have

$$p_{ng} - p = (p_{ng} - E[p_{ng}]) + (E[p_{ng}] - p).$$

To prove that this expression converges to zero in probability, it is sufficient to
prove that $\mathrm{Var}[p_{ng}] \to 0$ and $E[p_{ng}] - p \to 0$ as $n \to \infty$, $g \to 0$. We have

$$\mathrm{Var}[p_{ng}] = \pi^2g^2\,\mathrm{Var}\left[\frac{1}{2\pi}\int_{-1/g}^{1/g} \frac{\phi_{emp}(t)\phi_k(gt)}{e^{-\sigma^2t^2/2}}\,dt\right] = \pi^2g^2\,\mathrm{Var}[\tilde f_{ng}(0)].$$

Here it is understood that replacing the subindex $h$ by $g$ entails replacement of the
smoothing characteristic function $\phi_w$ by $\phi_k$. By Lemma 5.4,

$$\mathrm{Var}[p_{ng}] = O\left(\frac{1}{n}\,g^{4+4\alpha}e^{\sigma^2/g^2}\right). \tag{5.5}$$

This converges to zero due to the condition on $g$. Furthermore,

$$E[p_{ng}] - p = p\left(\frac{1}{2}\int_{-1}^{1}\phi_k(t)\,dt - 1\right) + (1 - p)\,\frac{g}{2}\int_{-1/g}^{1/g}\phi_f(t)\phi_k(gt)\,dt. \tag{5.6}$$

The first term here is zero, since $\phi_k$ integrates to 2, while the second term
converges to zero, which can be seen upon noticing that $\phi_k$ is bounded, $\phi_f$ is
integrable and that this term is bounded by a constant times

$$(1 - p)\,g\int_{-\infty}^{\infty}|\phi_f(t)|\,dt,$$

which converges to zero as $g \to 0$. The last part of the theorem follows from the
identity

$$\int_{-1/g}^{1/g}\phi_f(t)\phi_k(gt)\,dt = g^{\gamma}\int_{-1/g}^{1/g} t^{\gamma}\phi_f(t)\,\frac{\phi_k(gt)}{(gt)^{\gamma}}\,dt$$

and Conditions 1.1 and 1.3, because Condition 1.3 implies the existence of a
constant $K$, such that $\sup_t|\phi_k(t)t^{-\gamma}| \leq K$. □
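
The quantities appearing in this proof can be computed directly. The Python sketch below evaluates $p_{ng} = \frac{g}{2}\int_{-1/g}^{1/g}\phi_{emp}(t)\phi_k(gt)e^{\sigma^2t^2/2}\,dt$ on a simulated sample and the second term of (5.6) for a few bandwidths. It is an illustration under assumptions only: the function $\phi_k(t) = \frac{35}{16}(1 - t^2)^3 1_{[|t| \leq 1]}$ is a hypothetical choice that integrates to 2 but is not claimed to be the kernel of the paper or to satisfy Condition 1.3, and the density $f$ (normal with mean 5 and variance 1), $\sigma = 0.3$, $g = 0.5$, $n = 2000$ and the seed are likewise made up.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoid rule for samples y on the grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def phi_k(t):
    """ASSUMED smoothing function: (35/16)(1 - t^2)^3 on [-1, 1], integrating to 2."""
    return np.where(np.abs(t) <= 1.0, 35.0 / 16.0 * (1.0 - t**2) ** 3, 0.0)

def p_ng(sample, g, sigma, m=401):
    """(g/2) * integral over [-1/g, 1/g] of phi_emp(t) phi_k(g t) exp(sigma^2 t^2 / 2)."""
    t = np.linspace(-1.0 / g, 1.0 / g, m)
    phi_emp = np.exp(1j * np.outer(t, sample)).mean(axis=1)
    # The imaginary part is odd in t and integrates to zero over the symmetric range.
    integrand = phi_emp.real * phi_k(g * t) * np.exp(0.5 * sigma**2 * t**2)
    return 0.5 * g * trapezoid(integrand, t)

def bias_term(g, p, mean=5.0, m=4001):
    """Second term of (5.6) when f is the N(mean, 1) density (an assumption)."""
    t = np.linspace(-1.0 / g, 1.0 / g, m)
    phi_f = np.cos(mean * t) * np.exp(-0.5 * t**2)    # real part of the N(mean, 1) c.f.
    return (1.0 - p) * 0.5 * g * trapezoid(phi_f * phi_k(g * t), t)

rng = np.random.default_rng(0)
n, p_true, sigma, g = 2000, 0.1, 0.3, 0.5             # hypothetical settings
u = rng.random(n) >= p_true                           # U = 0 with probability p_true
sample = u * rng.normal(5.0, 1.0, size=n) + sigma * rng.normal(size=n)

p_hat = p_ng(sample, g, sigma)
biases = {gg: bias_term(gg, p_true) for gg in (1.0, 0.5, 0.25)}
print(p_hat, biases)
```

In this example the density $f$ puts little mass near zero, so the bias term in (5.6) is small already for moderate $g$; for densities with substantial mass near the atom the choice of $k$ and $g$ matters considerably more.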

Next we prove asymptotic normality of $p_{ng}$.

Proof of Theorem 2.4. The result follows from the definition of $p_{ng}$ and
Corollary 5.1, because $p_{ng} = g\pi\tilde f_{ng}(0)$ essentially is a rescaled version of $\tilde f_{ng}(0)$. □

Now we are ready to prove Theorem 2.5. 

Proof of Theorem 2.5. Write 



>a p a 2 /(2h 2 ) (fnhg( x ) E [fnhg( X )]) 



1 



1 



h l+2a e *y(2hi) \l-p ng 2TT 
1 1 



c _ ltx (j)emp(t)(j)io{ht) ^ 



-<r 2 t 2 /2 



E 



1 - p ng 2n 

i 



c -it X 4>emp{t)4> w (ht) ^ 



-aH 2 12 



(X_\ I Png 

\h) \l-p„ s 



-E 



Png 



l-p 



i "J 



(5.7) 
We want to prove that the first term is asymptotically normal, while the second
term converges to zero in probability. Application of Slutsky's lemma, see
Lemma 2.8 in [41], will then imply that the above expression is asymptotically
normal.

First we deal with the second term. We have 



/ l 2+2a e cr 2 /(2/t 2 



X 

■W I — 



Png 
1 ~ Png 



-»(s) 



- E 



y/n 



Png 



1-P 



i "J 

g 2+2a e a 2 /{2g 2 )' 



f l 2+2a e cr 2 /(2h 2 ) 



Vu 



g 2+2 ae ai/{2gi) \\ - p ng 

Note that Condition 1.4 implies 

l+ct 



E 



1 - Png 



g 2+2a e cr z /(2g 2 ) 



h 2+2a e ay(2h2) ^ 

Next we prove that 



1 + S n 



exp ( -(£„ - ?y„)logn 



ff 2+2a e <rV(2 9 2 ) \1 - p, 



-E 



Pr, 



1 - Png 



(5- 



0. 



(5.9) 



is asymptotically normal. Then (5.8) will converge to zero in probability, since 
convergence to a constant in distribution is equivalent to convergence to the 
same constant in probability and because w is bounded. We have 



(p ng -E\p ng ])^N\0, 



C 2 (r(l + a)) 2 



ff 2+2a et T 2 /(2 S 2 ) 

which can be seen as follows: 

-(png -E[p nff ]) = 



1 



2+2q n 



(5.10) 



g 2+2Q e cr 2 /(2 9 2 



g 2+2a e <T2/(2g2) 
g 2+2a e ^/(2g 2 ) 



{p ng -E[p ng }) 

{_Png Png E [png 



Png})- 



Due to Theorem 2.4 the first term here yields the asymptotic normality. We 
will prove that the second term converges to zero in probability. To this end it 
is sufficient to prove that 



Var 



g 2-\-2oL 

It follows from the definition of p ng and Lemma 5.1 that 

Var [p ng -p ng ] < E [(1 - £n -Pn 9 ) 2 l[p„ s >l- e „]] 

< (2 + 2^V (1+Q) e CT2/92 ) P( Png > 1 - Bn ), (5.12) 



3 o- 2 /(2g 2 ) ^ P "9 P"9) 



g 4+4 ae *y g i 



Var [p ng — Png ]->0. (5.11) 
where $K$ is some constant. This and (5.11) imply that we have to prove

$$nP(p_{ng} > 1 - \epsilon_n) \to 0. \tag{5.13}$$

Now

$$P(p_{ng} > 1 - \epsilon_n) = P(p_{ng} - E[p_{ng}] > 1 - \epsilon_n - E[p_{ng}]). \tag{5.14}$$

Denote $t_n = 1 - \epsilon_n - E[p_{ng}]$ and select $n_0$ so large that for $n \geq n_0$, we have
$t_n > 0$. Notice that $t_n \to 1 - p$, which follows from (5.6). The probability
in (5.14) is bounded by $P(|p_{ng} - E[p_{ng}]| > t_n)$. Note that



n 1 

E-irkg 
n 

3=1 



with 

By Lemma 5.1, which is applicable in view of Condition 1.3, 

l nkg (—) (5.15) 
n V 9 J 

is bounded by a constant $K$, say, times $g^{2+2\alpha}e^{\sigma^2/(2g^2)}n^{-1}$. Hoeffding's inequality,
see [29], then yields

/ 2t 2 n \ 

P(\p ng -E\p ng }\ > tn) < 2cxp (^-^| g2(2+2a)ej2/g2 j • (5.16) 

Since now 

2tl n 



n 



P(\p ng - E [p ng \ \ > t n ) < 2n exp 



K2 g 2(2+2a) e a 2 /g 



2 / n 2 I ) 



it is enough to prove that the term on the right-hand side converges to zero. 
Taking the logarithm yields 

log 2 + log n 



J(2 g 2(2+2a) e a 2 /g 2 ' 

This diverges to minus infinity, because the last term dominates $\log n$, i.e.

$$\frac{n}{g^{2(2+2\alpha)}e^{\sigma^2/g^2}}\,\frac{1}{\log n} \to \infty.$$

The latter fact can be seen by taking the logarithm of the left-hand side and
using (1.18). We obtain

$$\log n - (2 + 2\alpha)\log g^2 - \frac{\sigma^2}{g^2} - \log\log n
= -\delta_n\log n + (1 + 2\alpha)\log\log n - (2 + 2\alpha)\log\sigma^2 + (2 + 2\alpha)\log(1 + \delta_n) \to \infty,$$

which follows from (1.18). This in turn proves that (5.10) is asymptotically
normal. Since the derivative $(y/(1 - y))' \neq 0$, a minor variation of the $\delta$-method
then implies that (5.9) is also asymptotically normal (see Theorem 3.8 in [41]
for the $\delta$-method). Consequently, the second term in (5.7) converges to zero in
probability.

We now consider the first term in (5.7) and want to prove that it is asymp- 
totically normal. Rewrite this term as 



1 1 



^1+2^/(2^) \ l-p2<K 

1 1 

E 



1 -p2ir 
1 



c - itx 4>e mp {t)4>w{ht) ^ 



c -itx <t>emp{t)4>w{ht) ^ 



-<T 2 t 2 /2 



1 



1 -p) 27T 



E 



1 -Pr, 



1 - p J 2?r 



OO 



itx <t>em P (t)<l) w {ht) , 
e „-o- 2 t 2 /2 



Thanks to Corollary 5.1 the first summand here is asymptotically normal. We 
will prove that the second term vanishes in probability. Due to Chebyshev's 
inequality, it is sufficient to study the behaviour of 



h l+2a e a 2 /(2h 2 ) 



E 



Png - P 



1 



{t)4> w {ht) 

e -oH*/2 



(It 



(l-p ng )(l- P ) 27T 

By the Cauchy-Schwarz inequality, after taking squares, we can instead consider 

(Png - Pf 



/ l 2(l+2a) e <T 2 //i : 



Notice that 



■E 



{i-p ng y{i- P y 



x E 



1 

2^ 



c _ ltx (j)e mp (t)(j) w (ht) ^ 



(5.17) 



E 



1 

2tt 



c -itx ^emp(t)(f> w {ht) ^ 



= Var 



g-<r 2 t 2 /2 

" 1 

27 



c -Ux ^emp{t)(j) w (ht) ^ 



-<r 2 t 2 /2 



E 



1 

27 



■ remp 



dt 



It is easy to see that this expression is of order $h^{-2}$. Indeed, due to Lemma 5.4
the first term in this expression is of order $n^{-1}h^{2(1+2\alpha)}e^{\sigma^2/h^2}$. The fact that
this in turn is of lower order than $h^{-1}$ can be seen in the same way as we did
with (5.6). For the second term we have



E 



1 

2^ 



c -itx <t>emp(t)4>w(ht) ^ 



-CT 2 t 2 /2 



and this is of order $h^{-2}$ because



1 r 



c cl> f (t)(j> w (ht)dt 



<(1-P)^/ \^f(t)\dt 



and because w is bounded. Consequently, taking into account (5.17), we have 
to study 



/ l 4+4a e o- 2 /'! : 



-E 



(Png ~ PY 



(i-p na ra-p) 2 



or 



(1- 



p \2 £ 2/ l 2(2+2a) e <r 2 /'i 2 E P ^ ' 



(5.18) 



since (1 — p ng ) 2 < e„ 2 . Now 

E [(Png ~ P) 2 ] < 2E [(p ng - Png f] + 2E [{p ng - pf]. 
Hence we have to prove that 

e 2/j2(2+2a) e( T 2 /'» 2 E ~~ P ™ 9 ^ ~^ ' 

-E[( P)19 -p) 2 ] ^0. 



£ 2 ^2(2+2a) e a 2 //i 



(5.19) 
(5.20) 



The first fact essentially follows from the arguments concerning (5.11), since
the presence of an additional factor $\epsilon_n^{-2}$, given Condition 1.5, does not affect the
arguments used. Indeed, (5.19) will hold true, if we prove that



1 



~P(Png > 1 - £«) -> 0. 



Here we used the definitions of p ng and p ng and Lemma 5.1. Now notice that 
{g/h) A+Aa — > 1, which follows from Condition 1.4 and that by arguments con- 
cerning (5.13) we have nP(p ng > 1 — e„) — > 0. Moreover, under Conditions 1.4 
and 1.5 we have e~ 2 e cr / 2 ^ l / g -^l h ) _> q, which can again be seen by taking the 
logarithm and verifying that it diverges to minus infinity. This proves (5.19). 
Next we will prove (5.20). Notice that the latter is in turn implied by 



e 2/ l 2(2+2a) e <r 2 // l 2 Var b'«/] + e 2/ l 2(2+2a) e cr 2 // i 2 ( E ^ >n 9 P ^ ~* ' 
The first term here converges to zero by (5.5) and Conditions 1.4 and 1.5. Now
we turn to the second term. Taking into account (5.6), we have to study the
behaviour of

This can be rewritten as 

/ VTi g 2+2a e ^/(2 g 2 ) \ ^ ,1 ft\ 

The factor between the brackets in this expression converges to zero. Therefore 
it is sufficient to consider 



Rewrite this as 



5 l+2a e aV(2 S 2 ) y J_ 1/g ^ > ( fft )7 

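Conditions 1.1, 1.3, 1.4 and 1.5 imply that this expression converges to zero,
because the integral converges to a constant by the dominated convergence
theorem, while

$$\frac{\sqrt{n}\,g^{\gamma}}{g^{1+2\alpha}e^{\sigma^2/(2g^2)}} \to 0,$$

which can be seen by taking the logarithm and noticing that it diverges to minus
infinity. We obtain

$$\begin{aligned}
\frac{1}{2}\log n + (\gamma - 1 - 2\alpha)\log g - \frac{\sigma^2}{2g^2}
&= -\frac{\delta_n}{2}\log n + (\gamma - 1 - 2\alpha)\log\sigma \\
&\quad + \frac{1 + 2\alpha - \gamma}{2}\log(1 + \delta_n) + \frac{1 + 2\alpha - \gamma}{2}\log\log n \\
&\leq (\gamma - 1 - 2\alpha)\log\sigma \\
&\quad + \frac{1 + 2\alpha - \gamma}{2}\log(1 + \delta_n) + \frac{1 + 2\alpha - \gamma}{2}\log\log n \to -\infty,
\end{aligned} \tag{5.21}$$

which follows from the facts that $\delta_n > 0$ and $1 + 2\alpha - \gamma < 0$. Combination of
all these intermediary results completes the proof of the theorem. □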

Proof of Theorem 2.6. Write

$$E[f_{nhg}(x)] - f(x) = \{E[f_{nhg}(x) - f_{nh}(x)]\} + \{E[f_{nh}(x)] - f(x)\}. \tag{5.22}$$

Because of (1.7), the second summand in this expression vanishes as $h \to 0$.
Next we consider the first summand in (5.22). Using the definitions of $f_{nhg}(x)$
and $f_{nh}(x)$, we get

E [f* hg {x) - f nh {x)] = E 



Png - P 



(l-p ng )(l-p) 2lX J_ 



c -itx <f>emp(t)<f>w(ht) ^ 



-<J 2 t 2 /2 



E 



Png - P 



(! -Pn fl )(l ~P) 



(5.23) 



By the Cauchy-Schwarz inequality the absolute value of the first summand in 
this expression is bounded by 




— itx (Pemp 



1/2 



(5.24) 



The fact that this term converges to zero follows from (5.17) and subsequent 
arguments in the proof of Theorem 2.5. 

Now we have to study the second summand in (5.23). By the Cauchy-Schwarz 



inequality and the fact that (1 
1 1 



Png) 



< C 



(l-p) 2 ej 



1 fx 



it suffices to consider 



V[(Png-p) 2 } 



instead. The fact that this term converges to zero follows from the arguments 
concerning (5.18), which were given in the proof of Theorem 2.5. Indeed, the 
expression above can be rewritten as 



x 

''ft 



1 



(l-p) 2 e„ 2 /i 2 ( 2 + 2Q )e CT2 / 



h 2 



mPng-p) 2 } 



l l 2{l+2a) e a 2 /h 2 



Now use arguments concerning (5.19), (5.20) and the facts that $w$ is a bounded
function and under Condition 1.4 we have $h^{2(1+2\alpha)}e^{\sigma^2/h^2}n^{-1} \to 0$. This
concludes the proof of the first part of the theorem.

Now we prove the second part, an order expansion of the bias $E[f_{nhg}(x)] - f(x)$
under additional assumptions given in the statement of the theorem. The
proof follows the same steps as the proof of the first part of the theorem. Notice
that under the condition $f \in H(\beta, L)$, the second summand in (5.22) is of
order $h^{\beta}$, see Proposition 1.2 in [40]. We have to show then that $h^{\beta}$ times

$$\sqrt{n}\,h^{-1-2\alpha}e^{-\sigma^2/(2h^2)}$$

converges to zero. To this end it is sufficient to show that

$$\log\left(h^{\beta}\sqrt{n}\,h^{-1-2\alpha}e^{-\sigma^2/(2h^2)}\right) \to -\infty.$$

This essentially follows from the same argument as (5.21) (with $\gamma$ replaced by
$\beta$). Now consider (5.23). Its first term is bounded by (5.24) and we have to show
that this term multiplied by $\sqrt{n}\,h^{-1-2\alpha}e^{-\sigma^2/(2h^2)}$ tends to zero. The arguments
from the proof of Theorem 2.6 lead us to (5.18) and hence the desired result. □
Proof of Theorem 2.7. The result is a direct consequence of Theorems 2.5
and 2.6. □